Title Generation for Machine-Translated Documents
Abstract
In this paper, we present and compare automatically generated titles for machine-translated documents using several statistics-based methods. Naïve Bayesian, K-Nearest Neighbour, TF-IDF, and iterative Expectation-Maximization methods for title generation were applied to 1000 original English news documents and again to the same documents translated from English into Portuguese, French, or German and back to English using SYSTRAN. The AutoSummarization function of Microsoft Word was used as a baseline. Results on several metrics show that the statistics-based methods of title generation for machine-translated documents are fairly language independent, and that title generation is possible at a level approaching the accuracy of titles generated for the original English documents.
Similar resources
Automatic Title Generation using EM
Our prototype automatic title generation system, inspired by statistical machine-translation approaches [1], treats the document title like a translation of the document. Titles can be generated without extracting words from the document. A large corpus of documents with human-assigned titles is required for training title “translation” models. On an F1 evaluation score our approach outperformed ...
Japanese Term Extraction Using Dictionary Hierarchy and Machine Translation System
There have been many studies of automatic term recognition (ATR) and they have achieved good results. However, they focus on a mono-lingual term extraction method. Therefore, it is difficult to extract terms from documents in foreign languages. This paper describes an automatic term extraction method from documents in foreign languages using a machine translation system. In our method, we trans...
Inferring Location Names for Geographic Information Retrieval
For the participation of GIRSA at the GeoCLEF 2007 task, two innovative features were introduced to the geographic information retrieval (GIR) system: identification and normalization of location indicators, i.e. text segments from which a geographic scope can be inferred, and the application of techniques from question answering. In an extension of a previously performed experiment, the latter...
Multi-Document Summarization Using Cross-Language Texts
Without a summarization system in source language, we try to generate a summary in source language, using translated documents by a machine translator and a summarization system in target language. For summarizing multiple documents translated by a machine translator, we extract important sentences, and remove redundant sentences using an improved term-weighting method. It assigns weights to wo...
Improving Statistical Machine Translation Using Comparable Corpora
Title of dissertation: Improving Statistical Machine Translation Using Comparable Corpora. Matthew Garvey Snover, Doctor of Philosophy, 2010. Dissertation directed by: Professor Bonnie Dorr, Department of Computer Science. With thousands of languages in the world, and the increasing speed and quantity of information being distributed across the world, automatic translation between languages by comp...
Publication date: 2001